Clustering Files with Extended File Attributes in Metadata

Authors

  • Lin Han
  • Hao Huang
  • Changsheng Xie
  • Wei Wang
Abstract

Classification and search play an important role in modern file systems, and file clustering is an effective way to support both. This paper presents a new labeling system built on the Extended File Attributes [1] of the file system, together with a simple file clustering algorithm based on that labeling system. By treating attributes and attribute-value pairs as labels, the features of a file can be represented as a binary vector of labels. Well-known binary vector dissimilarity measures can then be applied in this binary vector space, and files can be clustered according to those measures. The approach is evaluated on several real-life datasets, and the results indicate that precise clustering of files is achieved at an acceptable cost.
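
The labeling idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which dissimilarity measure or clustering procedure the paper uses, so the Jaccard dissimilarity and the threshold-based grouping below are assumptions chosen for illustration, and the attribute names (`user.type`, `user.project`, `user.author`) and file names are hypothetical. Extended attributes are simulated with plain dictionaries so the example runs on any platform.

```python
from itertools import combinations


def labels_of(xattrs):
    """Treat each attribute name and each attribute=value pair as a label."""
    labels = set()
    for name, value in xattrs.items():
        labels.add(name)
        labels.add(f"{name}={value}")
    return labels


def to_binary_vector(labels, vocabulary):
    """1 where the file carries the label, 0 otherwise."""
    return [1 if label in labels else 0 for label in vocabulary]


def jaccard_dissimilarity(u, v):
    """One well-known binary measure (assumed here, not taken from the paper)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def cluster(vectors, threshold):
    """Group files linked by a dissimilarity <= threshold (union-find)."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if jaccard_dissimilarity(vectors[i], vectors[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical files with simulated extended attributes.
files = {
    "a.jpg": {"user.type": "photo", "user.project": "trip"},
    "b.jpg": {"user.type": "photo", "user.project": "trip"},
    "c.txt": {"user.type": "report", "user.author": "lin"},
}
names = sorted(files)
vocabulary = sorted(set().union(*(labels_of(files[n]) for n in names)))
vectors = [to_binary_vector(labels_of(files[n]), vocabulary) for n in names]
clusters = cluster(vectors, threshold=0.5)
print([[names[i] for i in group] for group in clusters])
```

On Linux, the dictionaries could instead be filled from real files with Python's `os.listxattr` and `os.getxattr`. The pairwise comparison is quadratic in the number of files, which is acceptable for a sketch but not necessarily for large datasets.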

Similar resources

Extended File System Metadata Management with Relational Databases

Modern file systems need to handle extended metadata. Existing file systems are not equipped to manage metadata for the number and diversity of files they now support. Users need better searching and querying capabilities. Metadata within files still remains application- and file format-specific and is often proprietary, which makes searching difficul...

Design and Implementation of a Metadata-Rich File System

Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file syste...

Using Provenance to Extract Semantic File Attributes

Rich, semantically descriptive file attributes are valuable in many contexts, such as semantic namespaces and desktop search. Descriptive attributes help users to find files placed in seemingly-arbitrary locations by different applications. However, extracting semantic attributes from file contents is nontrivial. An alternative is to examine file provenance: how and when files are used, and the...

SmartStore: A New Metadata Organization Paradigm with Semantic-Awareness

Fast and flexible metadata retrieval is critical in next-generation data storage systems. As storage capacity approaches the exabyte level and the number of stored files reaches the billions, the directory-tree based metadata management widely deployed in conventional file systems can no longer meet the requirements of scalability and functionality. At the same time, new I/O interfaces are of gr...

Cmpsci 677 Operating Systems 20.1 Stand-alone (unix) File System 20.2 Distributed File System

Files are stored as uninterpreted sequences of bytes. Data inside the file is not associated with the file type. A directory is a file (table) that stores the names of other files, their metadata (inodes), and the devices associated with those files. Inodes store file attributes and a multi-level index (levels of pointers) that lists the disk block locations for the file. To read the file we first need to ...


Journal title:
  • Journal of Multimedia

Volume 9  Issue 

Pages  -

Publication date 2014